Create DataFrame From Hive Table

Hive support comes bundled with the Spark SQL library as HiveContext, which inherits from SQLContext. Using a HiveContext, you can create and find tables in the Hive metastore and write queries against them using HiveQL. Users who do not have an existing Hive deployment can still create a HiveContext; Spark then creates a local metastore in the current working directory.
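For Spark versions before 2.0, a minimal sketch of building a HiveContext directly might look like the following (the application name and local master are illustrative; HiveContext is deprecated from Spark 2.0 onward in favour of SparkSession):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Local master is only for testing; in a cluster the master comes from spark-submit.
val conf = new SparkConf().setAppName("hive-context-example").setMaster("local[*]")
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)

// HiveQL queries run against the local or configured Hive metastore.
hiveContext.sql("SHOW TABLES").show()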

From Spark 2.0 onward, you can read data from the Hive warehouse and write or append new data to Hive tables using SparkSession:
  • Create a DataFrame from an existing Hive table.
  • Save a DataFrame to a new Hive table.
  • Append data to an existing Hive table, either with an INSERT statement or with the append write mode.
All three operations are sketched in the example after this list.
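A minimal sketch of all three, assuming a table named emp already exists in the metastore and using emp_backup as an illustrative target table name:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("hive-dataframe-example")
  .enableHiveSupport()    // needed to talk to the Hive metastore
  .getOrCreate()

// 1. Create a DataFrame from an existing Hive table.
val emp = spark.table("emp")

// 2. Save the DataFrame as a new Hive table.
emp.write.saveAsTable("emp_backup")

// 3a. Append rows with an INSERT statement.
spark.sql("INSERT INTO emp_backup SELECT * FROM emp")

// 3b. Append rows with the append write mode.
emp.write.mode(SaveMode.Append).saveAsTable("emp_backup")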
Create a DataFrame from a Hive table using a SQL query

val df = spark.sql("SELECT * from emp")
df.show

val df = spark.sql(" SELECT * from emp where job='SALESMAN' ")
df.show 


Create a DataFrame from a Hive table using spark.table
val df = spark.table("emp")
df.show



val df = spark.table("emp")
df.printSchema

 
Get the schema as a StructType
df.schema
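The returned StructType can also be inspected programmatically; a small sketch, assuming df was built from the emp table above (the actual field names depend on that table's definition):

import org.apache.spark.sql.types.StructType

val schema: StructType = df.schema

// Walk the fields and print each column's name, data type and nullability.
schema.fields.foreach { f =>
  println(s"${f.name}: ${f.dataType.simpleString}, nullable = ${f.nullable}")
}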


 
We can also run SQL queries directly on a file, without registering it as a table first.
spark.sql(" select * from parquet.`/Data/names.parquet` ").show
